Abstract

Haskell is a great language for writing and supporting embedded 
Domain Specific Languages (DSLs). Some form of observable 
sharing is often a critical capability for allowing so-called deep 
DSLs to be compiled and processed. In this paper, we describe and 
explore uses of an IO function for reification which allows direct
observation of sharing. 

Categories and Subject Descriptors D.3.3 [Programming Languages]:
Language Constructs and Features - Data types and structures



General Terms Design, Languages 

Keywords Observable Sharing, DSL Compilation 



1. Introduction 

Haskell is a great host language for writing Domain Specific
Languages (DSLs). There is a large body of literature and community
know-how on embedding languages inside functional languages,
including shallow embedded DSLs, which act directly on a principal
type or types, and deep embedded DSLs, which construct an
abstract syntax tree that is later evaluated. Both of these
methodologies offer advantages over directly parsing and compiling
(or interpreting) a small language. There is, however, a capability
gap between a deep DSL and a compiled DSL, including observable
sharing of syntax trees. This sharing can denote the sharing of
computed results, as well as loops in computations. Observing this
sharing can be critical to the successful compilation of our DSLs,
but breaks a central tenet of pure functional programming: referential
transparency.

In this paper, we introduce a new, retrospectively obvious way of 
adding observable sharing to Haskell, and illustrate its use on a 
number of small case studies. The addition makes nominal impact 
on an abstract language syntax tree; the tree itself remains a purely 
functional value, and the shape of this tree guides the structure of
a graph representation in a direct and principled way. The solution
makes good use of constructor classes and type families to provide 
a type-safe graph detection mechanism. 

Any direct solution to observable sharing, by definition, will break 
referential transparency. We restrict our sharing using the class type 
system to specific types, and argue that we provide a reasonable 
compromise to this deficiency. Furthermore, because we observe 
sharing on regular Haskell structures, we can write, reason about, 
and invoke pure functions with the same abstract syntaxes sans 
observable sharing. 

2. Observable Sharing and Domain Specific 
Languages 

At the University of Kansas, we are using Haskell to explore the 
description of hardware and system level concerns in a way that is 
suitable for processing and extracting properties. As an example, 
consider a simple description of a bit-level parity checker. 




[Figure: parity checker circuit, with its output labelled]



This circuit takes a stream of (clocked) bits, and does a parity count 
of all the bits, using a bit register. Given some Haskell functions 
as our primitives, we can describe this circuit in a similar fashion 
to Lava (Bjesse et al. 1998), Hawk (Matthews et al. 1998), and 
Hydra (O'Donnell 2002). For example, the primitives may take the
form

-- DSL primitives
xor   :: Bit -> Bit -> Bit
delay :: Bit -> Bit

where xor is a function which takes two arguments of the abstract
type Bit, performing a bit-wise xor operation, and delay takes
a single Bit argument, and outputs the bit value on the previous
clock cycle (via a register or latch). Jointly these primitives provide
an interface to a µLava.

These abstract primitives allow for a concise specification of our
circuits using the following Haskell.

-- Parity specification
parity :: Bit -> Bit
parity input = output
  where
    output = xor (delay output) input

We can describe our primitives using a shallow DSL, where Bit 
is a stream of boolean values, and xor and delay act directly on 
values of type Bit to generate a new value, also of type Bit. 

-- Shallow embedding
newtype Bit = Bit [Bool]

xor :: Bit -> Bit -> Bit
xor (Bit xs) (Bit ys) = Bit $ zipWith (/=) xs ys

delay :: Bit -> Bit
delay (Bit xs) = Bit $ False : xs

run :: (Bit -> Bit) -> [Bool] -> [Bool]
run f bs = rs
  where
    (Bit rs) = f (Bit bs)
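
As a quick check of the shallow embedding, the definitions above can be assembled into one module together with the parity specification. This is a minimal sketch; the name firstFive is an assumption introduced here for illustration and is not part of any library.

```haskell
-- Shallow embedding of the two primitives, plus the parity circuit.
newtype Bit = Bit [Bool]

xor :: Bit -> Bit -> Bit
xor (Bit xs) (Bit ys) = Bit (zipWith (/=) xs ys)

delay :: Bit -> Bit
delay (Bit xs) = Bit (False : xs)

-- The circuit refers to its own output through the delay register.
parity :: Bit -> Bit
parity input = output
  where
    output = xor (delay output) input

run :: (Bit -> Bit) -> [Bool] -> [Bool]
run f bs = rs
  where
    Bit rs = f (Bit bs)

-- Hypothetical helper: the first five outputs for a constant-True input.
firstFive :: [Bool]
firstFive = take 5 (run parity (repeat True))
```

Because Bit is a newtype over a lazy list, the recursive definition of output is productive: each output bit depends only on earlier output bits.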

Hawk used a similar shallow embedding to provide semantics for 
its primitives, which could be simulated, but the meaning of a spe- 
cific circuit could not be directly extracted. In order to construct 
a DSL that allows extraction, we can give our primitives an alter- 
native deep embedding. In a deep embedding, primitives are sim- 
ply Haskell data constructors, and a circuit description becomes a 
Haskell syntax tree. 

-- New, deep embedding
data Bit = Xor Bit Bit
         | Delay Bit
         | Input [Bool]
         | Var String
         deriving Show

xor = Xor
delay = Delay

run :: (Bit -> Bit) -> [Bool] -> [Bool]
run f bs = interp (f (Input bs))

interp :: Bit -> [Bool]
interp (Xor b1 b2) = zipWith (/=) (interp b1)
                                  (interp b2)
interp (Delay b)   = False : interp b
interp (Input bs)  = bs
interp (Var v)     = error "Var not supported"

The run function has the same behavior as the run in the shallow 
DSL, but has a different implementation. An interpreter function 
acts as a supporting literal interpreter of the Bit data structure. 

> run parity (cycle [True])
[True,False,True,False,True,...

The advantage of a deep embedding over a shallow embedding
is that a deep embedding can be extracted directly for processing
and analysis by other functions and tools, simply by reading
the data type which encodes the DSL. Our circuit is a function,
Bit -> Bit, so we provide the argument (Var "x"), where "x"
is unique to this circuit, giving us a Bit, with the Var being a
placeholder for the argument.

Unfortunately, if we consider the structure of parity, it contains a 
loop, introduced via the output binding being used as an argument 
to delay when defining output. 

> parity (Var "x") 

Xor (Delay (Xor (Delay (Xor (Delay (Xor (...

This looping structure can be used for interpretation, but not for fur- 
ther analysis, pretty printing, or general processing. The challenge 
here, and the subject of this paper, is how to allow trees extracted 
from Haskell hosted deep DSLs to have observable back-edges, or 
more generally, observable sharing. This is a well-understood
problem, with a number of standard solutions.

• Cycles can be outlawed in the DSL, and instead be encoded 
inside explicit looping constructors, which include, implicitly, 
the back edge. These combinators take and return functions that 
operate over circuits. This was the approach taken by Sharp 
(2002). Unfortunately, using these combinators is cumbersome 
in practice, forcing a specific style of DSL idiom for all loops. 
This is the direct analog of programming recursion in Haskell
using fix. 

• Explicit Labels can be used to allow later recovery of a graph
structure, as proposed by O'Donnell (1992). This means passing
an explicit name supply for unique names, or relying on the
user to supply them; neither is ideal and both obfuscate the
essence of the code expressed by the DSL.

• Monads, or other categorical structures, can be used to generate 
unique labels implicitly, or capture a graph structure as a net-list 
directly. This is the solution used in the early Lava implementa- 
tions (Bjesse et al. 1998), and continued in Xilinx Lava (Singh 
and James-Roxby 2001). It is also the solution used by Baars 
and Swierstra (2004), where they use applicative functors rather 
than monads. Using categorical structures directly impacts the 
type of a circuit, and our parity function would now be required 
to have the type 

parity :: Bit -> M Bit

Tying the knot of the back edges can no longer be performed
using the Haskell where clause, but instead the non-standard
recursive-do mechanism (Erkök and Launchbury 2002) is used.

• References can be provided as a non-conservative exten- 
sion (Claessen and Sands 1999). This is the approach taken 
by Chalmers Lava, where a new type Ref is added, and pointer 
equality over Ref is possible. This non-conservative extension 
is not to everyone's taste, but does neatly solve the problem of 
observable sharing. Chalmers Lava's principal structure con- 
tains a Ref at every node. 

In this paper, we advocate another approach to the problem of 
observable sharing, namely an IO function that can observe sharing
directly. Specifically, this paper makes the following contributions. 

• We present an alternative method of observable sharing, using 
stable names and the IO monad. Surprisingly, it turns out that
our graph reification function can be written as a reusable com- 
ponent in a small number of lines of Haskell. Furthermore, our 
solution to observable sharing may be more palatable to the 
community than the Ref type, given we accept IO functions 
routinely. 

• We make use of type functions (Chakravarty et al. 2005), a
recent addition to the Haskell programmers' portfolio of tricks,
and therefore act as a witness to the usefulness of this new
extension.

• We illustrate our observable sharing library using a small num- 
ber of examples including digital circuits and state diagrams. 

• We extend our single type solution to handle Haskell trees 
containing different types of nodes. This extension critically 
depends on the design decision to use type families to denote 
that differently typed nodes map to a shared type of graph node. 

• We illustrate this extension being used to capture deep DSLs 
containing functions, as well as data structures, considerably 
extending the capturing potential of our reify function. 

Our solution is built on the StableName extension in GHC (Peyton 
Jones et al. 1999), which allows for a specific type of pointer 
equality. The correctness and predictability of our solution depend
on the properties of the StableName implementation, a point we 
return to in section 12. 
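
To make this kind of pointer equality concrete, the sketch below uses makeStableName and eqStableName from GHC's System.Mem.StableName module. The function name sameObject is an assumption for illustration. Equal stable names imply that the two arguments are the same heap object; the converse is not guaranteed in general, so a False answer does not prove the absence of sharing.

```haskell
import System.Mem.StableName (eqStableName, makeStableName)

-- Hypothetical helper: do two values occupy the same heap object?
-- Both arguments are forced to weak head normal form first, because a
-- stable name taken before evaluation may differ from one taken after.
sameObject :: a -> a -> IO Bool
sameObject x y = do
  nx <- x `seq` makeStableName x
  ny <- y `seq` makeStableName y
  return (eqStableName nx ny)
```

Because the answer depends on how the runtime represents a value, the function lives in IO rather than being exposed as a pure equality test.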



3. Representing Sharing in Haskell 

Our solution to the observable sharing problem addresses the
problem head on. We give specific types the ability to have their
sharing observable, via a reify function which translates a tree-like
data structure into a graph-like data structure, in a type safe
manner. We use the class type system and type functions to allow
Haskell programmers to provide the necessary hooks for specific data
structures, typically abstract syntax trees that actually capture
abstract syntax graphs.

There are two fundamental issues with giving a type and
implementation to such a reify function. First, how do we allow a
graph to share a typed representation with a tree? Second, observable
sharing introduces referential opaqueness, destroying referential
transparency: a key tenet of functional programming. How do we
contain, and reason about, referential opaqueness in Haskell? In
this section, we introduce our reify function, and honestly admit
opaqueness by making the reify function an IO function.

Graphs in Haskell can be represented using a number of idioms,
but we use a simple association list of pairs containing Uniques as
node names, and node values.

type Unique = Int

data BitGraph = BitGraph [(Unique, BitNode Unique)]
                         Unique

data BitNode s = GraphXor s s
               | GraphDelay s
               | GraphInput [Bool]
               | GraphVar String

We parameterize BitNode over the Unique graph "edges", to
facilitate future generic processors for our nodes.

Considering the parity example, we might represent the sharing
using the following expression.

graph = BitGraph [ (1, GraphXor 2 3)
                 , (2, GraphDelay 1)
                 , (3, GraphVar "x")
                 ] 1



This format is a simple and direct net-list representation. If we can
generate this graph, then using smarter structures like Data.Map
downstream in a compilation process is straightforward. Given a
Functor instance for BitNode, we can generically change the
types of our node labels.
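
A minimal sketch of such a Functor instance, and of using it to change the label type, follows. The helper names relabel and signalName are assumptions introduced for illustration.

```haskell
type Unique = Int

data BitNode s
  = GraphXor s s
  | GraphDelay s
  | GraphInput [Bool]
  | GraphVar String
  deriving (Eq, Show)

-- fmap touches only the edge positions of a node.
instance Functor BitNode where
  fmap f (GraphXor a b)  = GraphXor (f a) (f b)
  fmap f (GraphDelay a)  = GraphDelay (f a)
  fmap _ (GraphInput bs) = GraphInput bs
  fmap _ (GraphVar v)    = GraphVar v

-- Hypothetical helper: apply one renaming to node names and edges alike.
relabel :: (Unique -> l) -> [(Unique, BitNode Unique)] -> [(l, BitNode l)]
relabel f nodes = [ (f u, fmap f n) | (u, n) <- nodes ]

-- Hypothetical naming scheme, e.g. for emitting signal declarations.
signalName :: Unique -> String
signalName u = "sig" ++ show u
```

For example, relabel signalName turns the node (1, GraphXor 2 3) into ("sig1", GraphXor "sig2" "sig3").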

We can now introduce the type of a graph reification function. 

reifyBitGraph :: Bit -> IO BitGraph

With this function, and provided we honor any preconditions of its
use, embedding our µLava in a way that can have sharing extracted
is trivial. Of course, the IO monad is needed. Typically, this reify
replaces either a parser (which would use IO), or will call another
IO function later in a pipeline, for example to write out VHDL from
the BitGraph or display the graph graphically. Though the use of
IO is not present in all usage models, having IO does not appear to
be a handicap to this function.
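
One way a function of this type might be written is sketched below. This is an illustration under stated assumptions, not the implementation presented by this paper: it keeps the table of visited nodes in a plain association list inside an IORef, names nodes in order of first visit, and relies on GHC's System.Mem.StableName to recognise a node that has been seen before.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef, writeIORef)
import Data.List (sortOn)
import System.Mem.StableName (StableName, makeStableName)

data Bit = Xor Bit Bit | Delay Bit | Input [Bool] | Var String

type Unique = Int

data BitNode s = GraphXor s s | GraphDelay s | GraphInput [Bool] | GraphVar String
  deriving (Eq, Show)

data BitGraph = BitGraph [(Unique, BitNode Unique)] Unique
  deriving (Eq, Show)

-- Sketch only: a linear lookup table is used for brevity.
reifyBitGraph :: Bit -> IO BitGraph
reifyBitGraph root = do
  seen  <- newIORef ([] :: [(StableName Bit, Unique)])
  nodes <- newIORef []
  next  <- newIORef 1
  let visit :: Bit -> IO Unique
      visit b = do
        -- Force the node before naming it, so the name is stable.
        name  <- b `seq` makeStableName b
        table <- readIORef seen
        case lookup name table of
          Just u  -> return u              -- a back-edge or shared node
          Nothing -> do
            u <- readIORef next
            writeIORef next (u + 1)
            -- Record the node before visiting children, so loops terminate.
            modifyIORef seen ((name, u) :)
            node <- case b of
              Xor x y  -> GraphXor <$> visit x <*> visit y
              Delay x  -> GraphDelay <$> visit x
              Input bs -> return (GraphInput bs)
              Var v    -> return (GraphVar v)
            modifyIORef nodes ((u, node) :)
            return u
  r  <- visit root
  ns <- readIORef nodes
  return (BitGraph (sortOn fst ns) r)

-- The parity circuit from section 2, in the deep embedding.
parity :: Bit -> Bit
parity input = output
  where
    output = Xor (Delay output) input
```

Registering a node in the table before descending into its children is what turns the loop in parity into a back-edge rather than an infinite traversal.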

4. Generalizing the Reification Function 

We can now generalize reifyBitGraph into our generic graph
reification function, called reifyGraph. There are three things
reifyGraph needs to be able to do:

• First, have a target type for the graph representation to use as a 
result. 

• Second, be able to look inside the Haskell value under consid- 
eration, and traverse its structure. 

• Third, be able to build a graph from this traversal. 

We saw all three of these capabilities in our reifyBitGraph ex- 
ample. We can incorporate these ideas, and present our generalized 
graph reification function, reifyGraph. 

reifyGraph :: (MuRef t)
           => t -> IO (Graph (DeRef t))

The type for reifyGraph says, given the ability to look deep inside 
a structure, provided by the type class MuRef, and the ability to 
derive the shared, inner data type, provided by the type function 
DeRef, we can take a tree of a type that has a MuRef instance, and 
build a graph. 

The Graph data structure is the generalization of BitGraph, with 
nodes of the higher kinded type e, and a single root. 

type Unique = Int 

data Graph e = Graph [ (Unique, e Unique)] 
Unique 

Type functions and associated types (Chakravarty et al. 2005) are a
recent addition to Haskell. reifyGraph uses a type function to
determine the type of the nodes inside the graph. Associated types
allow the introduction of data and type declarations inside a class
declaration; a very useful addition indeed. This is done by
literally providing type functions which look like standard Haskell
type constructors, but instead use the existing class-based
overloading system to help resolve the function. In our example, we
have the type class MuRef, and the type function DeRef, giving the
following (incomplete) class declaration.

class MuRef a where
  type DeRef a :: * -> *

This class declaration creates a type function DeRef which acts
like a type synonym inside the class; it does not introduce any
constructors or abstraction. The * -> * annotation gives the kind of
DeRef, meaning it takes two type arguments, the relevant instance
of MuRef, and another, as yet unseen, argument. DeRef can be
assigned to any type of the correct kind, inside each instance.

In our example above, we want trees of type Bit to be represented
as a graph of BitNode, so we provide the following MuRef instance.

instance MuRef Bit where
  type DeRef Bit = BitNode



BitNode is indeed of kind * -> *, so the type of our reifyGraph
function specializes in the case of Bit to

reifyGraph :: Bit -> IO (Graph (DeRef Bit))

then, because of the type function DeRef, to

reifyGraph :: Bit -> IO (Graph BitNode)

The use of the type function DeRef to find the BitNode data-type
is critical to tying the input tree type to the node representation
type, though functional dependencies (Jones and Diatchki 2008) could
also be used here.

The MuRef class has the following definition.

class MuRef a where
  type DeRef a :: * -> *
  mapDeRef :: (Applicative f)
           => (a -> f u)
           -> a
           -> f (DeRef a u)

mapDeRef allows us, in a generic way, to reach into something
that has an instance of the MuRef class and recurse over relevant
children. The first argument is a function that is applied to the
children, the second is the node under consideration. mapDeRef
returns a single node, the type of which is determined by the DeRef
type function, for recording in a graph structure. The result value
contains unique indices, of type u, which were generated by the
invocation of the first argument. mapDeRef uses an applicative
functor (McBride and Patterson 2006) to provide the threading of
the effect of unique name generation.

To complete our example, we make Bit an instance of the MuRef
class, and provide the DeRef and mapDeRef definitions.

instance MuRef Bit where
  type DeRef Bit = BitNode

  mapDeRef f (Xor a b)  = GraphXor <$> f a <*> f b
  mapDeRef f (Delay b)  = GraphDelay <$> f b
  mapDeRef f (Input bs) = pure $ GraphInput bs
  mapDeRef f (Var nm)   = pure $ GraphVar nm

This is a complete definition of the necessary generics to provide
reifyGraph with the ability to perform type-safe observable sharing
on the type Bit. The form of mapDeRef is regular, and could
be automatically derived, perhaps using Template Haskell (Sheard
and Peyton Jones 2002). With this instance in place, we can use our
general reifyGraph function to extract our graph.

> reifyGraph $ parity (Var "x")
Graph [ (1,GraphXor 2 3)
      , (2,GraphDelay 1)
      , (3,GraphVar "x")
      ] 1

The reifyGraph function is surprisingly general, easy to enable
via the single instance declaration, and useful in practice. We
now look at a number of use cases and extensions to reifyGraph,
before turning to its implementation.

5. Example: Finite State Machines

As a simple example, take the problem of describing a state
machine directly in Haskell. This is easy but tedious because we need
to enumerate or label the states. Consider this state machine, a 5-7
convolutional encoder for a Viterbi decoder.

[Figure: state diagram of the 5-7 convolutional encoder]

One possible encoding is a step function, which takes input, and
the current state, and returns the output, and a new state. Assuming
that we use Boolean to represent 0 and 1, in the input and output,
we can write the following Haskell.

data State = ZeroZero | ZeroOne | OneZero | OneOne

type Input = Bool
type Output = (Bool,Bool)

step :: Input -> State -> (Output,State)
step False ZeroZero = ((False,False),ZeroZero)
step True  ZeroZero = ((True ,True ),ZeroOne)
step False ZeroOne  = ((True ,True ),OneOne)
step True  ZeroOne  = ((False,False),OneZero)
step False OneZero  = ((False,True ),ZeroZero)
step True  OneZero  = ((True ,False),ZeroOne)
step False OneOne   = ((True ,False),OneZero)
step True  OneOne   = ((False,True ),OneOne)

An arguably more declarative encoding is to use the binding itself
as the state's unique identifier.

data State i o = State [(i,(o,State i o))]



step :: (Eq i) => i -> State i o -> (o,State i o)
step i (State ts) = (output,st)
  where Just (output,st) = lookup i ts

state00 = State [ (False,((False,False),state01)),
                  (True, ((True ,True ),state00))]

state01 = State [ (False,((True ,True ),state11)),
                  (True, ((False,False),state10))]

state10 = State [ (False,((False,True ),state00)),
                  (True, ((True ,False),state01))]

state11 = State [ (False,((True ,False),state10)),
                  (True, ((False,True ),state11))]

Simulating this binding-based state machine is possible in pure
Haskell.

run :: (Eq i) => State i o -> [i] -> [o]
run st (i:is) = o : run st' is
  where (o,st') = step i st
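
The simulation can be exercised with a short, self-contained module. This is a sketch: it copies the four state bindings given above, handles the empty input list and a missing transition explicitly (both are additions made here), and the name firstFour is an assumption for illustration.

```haskell
data State i o = State [(i, (o, State i o))]

step :: Eq i => i -> State i o -> (o, State i o)
step i (State ts) =
  case lookup i ts of
    Just (output, st) -> (output, st)
    Nothing           -> error "step: no transition for this input"

-- The four mutually recursive state bindings, as given above.
state00, state01, state10, state11 :: State Bool (Bool, Bool)
state00 = State [(False, ((False, False), state01)), (True, ((True,  True),  state00))]
state01 = State [(False, ((True,  True),  state11)), (True, ((False, False), state10))]
state10 = State [(False, ((False, True),  state00)), (True, ((True,  False), state01))]
state11 = State [(False, ((True,  False), state10)), (True, ((False, True),  state11))]

run :: Eq i => State i o -> [i] -> [o]
run _  []       = []
run st (i : is) = o : run st' is
  where
    (o, st') = step i st

-- Hypothetical helper: outputs for the inputs True, False, True, False.
firstFour :: [(Bool, Bool)]
firstFour = take 4 (run state00 (cycle [True, False]))
```

The simulation only ever follows transitions, so it never needs to know that two occurrences of state00 are the same binding; it is exactly this identity that a pure function cannot recover.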

Extracting the sharing, for example to allow the display in the graph 
viewing tool dot (Ellson et al. 2003), is not possible in a purely 
functional setting. Extracting the sharing using our reifyGraph
allows the deeper embedding to be gathered, and other tools can 
manipulate and optimize this graph. 

data StateNode i o s = StateNode [ (i,(o,s)) ]
                     deriving Show

instance MuRef (State i o) where
  type DeRef (State i o) = StateNode i o
  mapDeRef f (State st) = StateNode <$> traverse tState st
    where
      tState (b,(o,s)) = (\ s' -> (b,(o,s'))) <$> f s

Here, traverse (from the Traversable class) is a traversal over 
the list type. Now we extract our graph. 

> reifyGraph state00
Graph [ (1,StateNode [ (False,((False,False),2))
                     , (True, ((True ,True ),1))
                     ])
      , (2,StateNode [ (False,((True ,True ),3))
                     , (True, ((False,False),4))
                     ])
      , (3,StateNode [ (False,((True ,False),4))
                     , (True, ((False,True ),3))
                     ])
      , (4,StateNode [ (False,((False,True ),1))
                     , (True, ((True ,False),2))
                     ])
      ] 1



6. Example: Kansas Lava 

At the University of Kansas, we are developing a custom version 
of Lava, for teaching and as a research platform. The intention is 
to allow for higher level abstractions, as supported by the Hawk 
DSL, but also allow the circuit synthesis, as supported by Lava. 
Capturing our Lava DSL in a general manner was the original 
motivation behind revisiting the design decision of using references 
for observable sharing in Chalmers Lava (Claessen 2001). In this 
section, we outline our design of the front end of Kansas Lava, and 
how it uses reifyGraph. 

The principal type in Kansas Lava is Signal, which is a phantom 
type (Leijen and Meijer 1999) abstraction around Wire, the inter- 
nal type of a circuit. 

newtype Signal a = Signal Wire 

newtype Wire = Wire (Entity Wire) 

Entity is a node in our circuit graph, which can represent gate
level circuits, as well as more complex blocks.



data Entity s
  = Entity Name [s]   -- an entity
  | Pad Name          -- an input pad
  | Lit Integer       -- a constant

and2 :: (Signal a, Signal a) -> Signal a
and2 (Signal w1, Signal w2)
  = Signal $ Wire $ Entity (name "and2") [w1,w2]



In both Kansas Lava and Chalmers Lava, phantom types are used 
to allow construction of semi-sensible circuits. For example, a 
mux will take a Signal Bool as its input, but switch between 
polymorphic signals. 

mux :: Signal Bool
    -> (Signal a, Signal a)
    -> Signal a
mux (Signal s) (Signal w1, Signal w2)
  = Signal
  $ Wire
  $ Entity (name "mux") [s,w1,w2]

Even though we construct trees of type Signal, we want to ob- 
serve graphs of type Wire, because every Signal is a construc- 
tor wrapper around a tree of Wire. We share the same node data- 
type between our Haskell tree underneath Signal, and inside our 
reified graph. So Entity is parametrized over its inputs, which are 
Wires for our circuit specification tree, and are Unique labels in 
our graph. This allows some reuse of traversals, and we use in- 
stances of the Traversable, Functor and Foldable classes to 
help here. 

Our MuRef instance therefore has the form: 

instance MuRef Wire where 
type DeRef Wire = Entity 
mapDeRef f (Wire s) = traverse f s 

We also define instances for the classes Traversable, Foldable 
and Functor, which are of general usefulness for performing other 
transformations, specifically: 

instance Traversable Entity where
  traverse f (Entity v ss) = Entity v <$> traverse f ss
  traverse _ (Pad v)       = pure $ Pad v
  traverse _ (Lit i)       = pure $ Lit i

instance Foldable Entity where
  foldMap f (Entity v ss) = foldMap f ss
  foldMap _ (Pad v)       = mempty
  foldMap _ (Lit i)       = mempty

instance Functor Entity where
  fmap f (Entity v ss) = Entity v (fmap f ss)
  fmap _ (Pad v)       = Pad v
  fmap _ (Lit i)       = Lit i
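
Instances of this regular shape can also be derived mechanically. The sketch below assumes GHC's DeriveTraversable extension, uses String as a stand-in for the Name type (an assumption, since Name is not defined in this section), and introduces two hypothetical helpers, inputs and renumber, to show the derived instances at work.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.Foldable (toList)

-- Assumption: a plain String stands in for the library's Name type.
type Name = String

data Entity s
  = Entity Name [s]   -- an entity
  | Pad Name          -- an input pad
  | Lit Integer       -- a constant
  deriving (Eq, Show, Functor, Foldable, Traversable)

-- Hypothetical helper: the input edges of a node, via Foldable.
inputs :: Entity s -> [s]
inputs = toList

-- Hypothetical helper: rename every edge using a lookup table,
-- failing as a whole if any edge is missing from the table.
renumber :: [(Int, String)] -> Entity Int -> Maybe (Entity String)
renumber table = traverse (`lookup` table)
```

The same traversal is what a MuRef instance for Wire would use, with the effect being unique-name generation rather than Maybe.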

Now, with our Kansas Lava Hardware specification graph captured 
inside our Graph representation via reifyGraph, we can perform 
simple translations, and pretty print to VHDL, and other targets. 

7. Comparing reifyGraph and Ref types

Chalmers Lava uses Ref types, which admit pointer equality. The
interface to Ref types has the following form.

data Ref a = ...
instance Eq (Ref a)
ref   :: a -> Ref a
deref :: Ref a -> a

An abstract type Ref can be used to box polymorphic values, via
the (unsafe) function ref, and Ref admits equality without looking
at the value inside the box. Ref works by generating a new, unique
label for each call to ref. So a possible implementation is

data Ref a = Ref a Unique

instance Eq (Ref a) where
  (Ref _ u1) == (Ref _ u2) = u1 == u2

ref a = unsafePerformIO $ do
  u <- newUnique
  return $ Ref a u

deref (Ref a _) = a

with the usual caveats associated with the use of unsafePerformIO.

To illustrate a use-case, consider a transliteration of Chalmers Lava
to use the same names as Kansas Lava. We can use a Ref type at
each node, by changing the type of Wire, and reflecting this change
into our DSL functions.

-- Transliteration of Chalmers Lava
newtype Signal s = Signal Wire

newtype Wire = Wire (Ref (Entity Wire))

data Entity s
  = Entity Name [s]
  | ...

and2 :: Signal a -> Signal a -> Signal a
and2 (Signal w1) (Signal w2)
  = Signal
  $ Wire
  $ ref
  $ Entity (name "and2") [w1,w2]

The differences between this definition and the Kansas Lava
definition are:

• The type Wire includes an extra Ref indirection;

• The DSL primitives include an extra ref.

Wire in Chalmers Lava admits observable sharing directly, while
Kansas Lava only admits observable sharing using reifyGraph.
The structure in Kansas Lava can be consumed by an alternative,
purely functional simulation function, without the possibility of
accidentally observing sharing. Furthermore, reifyGraph can operate
over an arbitrary type, and does not need to be wired into the
datatype. This leaves open a new possibility: observing sharing on
regular Haskell structures like lists, rose trees, and other structures.
This is the subject of the next section.

8. Lists, and Other Structures

In the Haskell community, sometimes recursive types are tied using 
a Mu type (Jones 1995). For example, consider a list specified in this 
fashion. 

newtype Mu a = In (a (Mu a)) 

data List a b = Cons a b | Nil

type MyList a = Mu (List a) 

Now, we can write a list using Cons, Nil, and In for recursion. The 
list [1,2,3] would be represented using the following expression. 

In (Cons 1 (In (Cons 2 (In (Cons 3 (In Nil)))))) 
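
To see that this encoding carries the same information as an ordinary list, the following sketch converts in both directions. The function names fromHaskellList and toHaskellList are assumptions introduced for illustration.

```haskell
newtype Mu a = In (a (Mu a))

data List a b = Cons a b | Nil

type MyList a = Mu (List a)

-- Hypothetical helper: wrap each cons cell of a regular list in In.
fromHaskellList :: [a] -> MyList a
fromHaskellList = foldr (\x rest -> In (Cons x rest)) (In Nil)

-- Hypothetical helper: unwrap In at every level of the spine.
toHaskellList :: MyList a -> [a]
toHaskellList (In Nil)           = []
toHaskellList (In (Cons x rest)) = x : toHaskellList rest
```

The recursion lives entirely in Mu; List itself only describes one layer, which is why the same List type can later describe one node of a graph.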

The generality of the recursion, captured by Mu, allows a general
instance of Mu for MuRef. Indeed, this is why MuRef is called
MuRef.

instance (Traversable a) => MuRef (Mu a) where
  type DeRef (Mu a) = a
  mapDeRef f (In a) = traverse f a

This generality is possible because we are sharing the representa- 
tion between structures. Mu is used to express a tree-like structure, 
where Graph given the same type argument will express a directed 
graph. In order to use MuRef, we need Traversable, and there- 
fore need to provide the instances for Functor, Foldable, and 
Traversable. 

instance Functor (List a) where 
fmap f Nil = Nil 

fmap f (Cons a b) = Cons a (f b) 

instance Foldable (List a) where 
foldMap f Nil = mempty 

foldMap f (Cons a b) = f b 

instance Traversable (List a) where 

traverse f (Cons a b) = Cons a <$> f b 
traverse f Nil = pure Nil 

Now a list, written using Mu, can have its sharing observed. 

> let xs = In (Cons 99 (In (Cons 100 xs)))
> reifyGraph xs
Graph [ (1,Cons 99 2)
      , (2,Cons 100 1)
      ] 1



The type List is used both for expressing trees and graphs. We can 
reuse List and the instances of List to observe sharing in regular 
Haskell lists. 

instance MuRef [a] where
  type DeRef [a] = List a
  mapDeRef f (x:xs) = Cons x <$> f xs
  mapDeRef f []     = pure Nil

That is, regular Haskell lists are represented as a graph, using List, 
and Mu List lists are also represented as a graph, using List. Now 
we can capture spine-level sharing in our list. 

> let xs = 99 : 100 : xs
> reifyGraph xs
Graph [ (1,Cons 99 2)
      , (2,Cons 100 1)
      ] 1



There is no way to observe built-in Haskell data structures using
Ref, which is an advantage of our reify-based observable sharing.
A list spine, being one dimensional, means that sharing will always
be represented via back-edges. A tree can have both loops and
acyclic sharing. One question we can ask is whether we can capture
the second level sharing in a list. That is, is it possible to observe
the difference between

let x = X 1 in [x,x] and [X 1,X 1] 

using reifyGraph? Alas, no, because the type of the element of a 
list is distinct from the type of the list itself. In the next section, we 
extend reifyGraph to handle nodes of different types inside the 
same reified graph. 

9. Observable Sharing at Different Types 

The nodes of the graph inside the runtime system of Haskell pro- 
grams have many different types. In order to successfully extract 
deeper into our DSL, we want to handle nodes of different types. 
GHC Haskell already provides the Dynamic type, which is a com- 
mon type for using with collections of values of different types. 
The operations are 

data Dynamic = ...
toDyn       :: Typeable a => a -> Dynamic
fromDynamic :: Typeable a => Dynamic -> Maybe a

Dynamic is a monomorphic Haskell object, stored with its type.
fromDynamic succeeds when the Dynamic was constructed and extracted
at the same type. Attempts to use fromDynamic at an incorrect
type always return Nothing. The class Typeable is derivable
automatically, as well as being provided for all built-in types. So
we have

> fromDynamic (toDyn "Hello") :: Maybe String
Just "Hello" 

> fromDynamic (toDyn (1,2)) :: Maybe String 
Nothing 

In this way Dynamic provides a type-safe cast.

In our extended version of reifyGraph, we require all nodes that
need to be compared for observational equality to be a member of 
the class Typeable, including the root of our Haskell structure we 
are observing. This gives the type of the extended reifyGraph. 

reifyGraph :: (MuRef s, Typeable s)
           => s -> IO (Graph (DeRef s))

The trick to reifying nodes of different type into one graph is to
have a common type for the graph representation. That is, if we
have a type A and a type B, then we can share a graph that is
captured to Graph C, provided that DeRef A and DeRef B both
map to C. We can express this, using the new ~ notation for type
equivalence. Specifically, the type

example :: (DeRef a ~ DeRef [a]) => [a]

expresses that a and [a] both share the same graph node type.

In order to observe sharing on nodes of types that are Typeable,
and share a graph representation type, we refine the type of
mapDeRef. The refined MuRef class has the following definition.

class MuRef a where
  type DeRef a :: * -> *

  mapDeRef :: (Applicative f)
           => (forall b .
                 ( MuRef b
                 , Typeable b
                 , DeRef a ~ DeRef b
                 ) => b -> f u)
           -> a
           -> f (DeRef a u)

mapDeRef has a rank-2 polymorphic functional argument for pro- 
cessing sub-nodes, when walking over a node of type a. This func- 
tional argument requires that 

• The sub-node be a member of the class MuRef;

• The sub-node be Typeable, so that we can use Dynamic internally;

• Finally, the graph representation of the a node and the graph
representation of the b node are the same type.

We can use this version of MuRef to capture sharing at different 
types. For example, consider the structure 

let xs = [1..3]
    ys = 0 : xs
in cycle [xs,ys,tail ys]

There are three types inside this structure, [[Int]], [Int], and 
Int. This means we need two instances, one for lists with element 
types that can be reified, and one for Int, and a common data-type 
to represent the graph nodes. 

data Node u = Cons u u
            | Nil
            | Int Int

instance ( Typeable a
         , MuRef a
         , DeRef [a] ~ DeRef a) => MuRef [a] where
  type DeRef [a] = Node

  mapDeRef f (x:xs) = Cons <$> f x <*> f xs
  mapDeRef f []     = pure Nil

instance MuRef Int where
  type DeRef Int = Node

  mapDeRef f n = pure $ Int n

The Node type is our reified graph node structure, with three
possible constructors, Cons and Nil for lists (of type [Int] or type
[[Int]]), and Int which represents an Int.

Figure 1. Sharing within structures of different types




Reifying the example above now succeeds, giving 

> reifyGraph (let xs = [1..3]
>                 ys = 0 : xs
>             in cycle [xs,ys,tail ys])
Graph [ (1,Cons 2 9)
      , (9,Cons 10 12)
      , (12,Cons 2 1)
      , (10,Cons 11 2)
      , (11,Int 0)
      , (2,Cons 3 4)
      , (4,Cons 5 6)
      , (6,Cons 7 8)
      , (8,Nil)
      , (7,Int 3)
      , (5,Int 2)
      , (3,Int 1)
      ] 1



Figure 1 renders this graph, showing we have successfully captured 
the sharing at multiple levels. 

10. Observing Functions 

Given we can observe structures with distinct node types, can we 
use the same machinery to observe functions? It turns out we can! 
A traditional way of observing functions is to apply a function to a 
dummy argument, and observe where this dummy argument occurs 
inside the result expression. At first, it seems that an exception can 
be used for this, but there is a critical shortcoming. It is impossible 
to distinguish between the use of a dummy argument in a sound 
way and examining the argument. For example 

\ x -> (1,[1..x])
gives the same result as

\ x -> (1,x)
when x is bound to an exception-raising thunk. 
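
The shortcoming can be reproduced with a small sketch. The exception type Dummy, the value dummyArg, and the helper hitsDummy are hypothetical names introduced only for this illustration. The probe forces a value to weak head normal form and reports whether the dummy exception was raised; it answers the same way for both functions, even though only the second one returns its argument untouched.

```haskell
import Control.Exception (Exception, evaluate, throw, try)

-- Hypothetical marker exception standing in for the dummy argument.
data Dummy = Dummy deriving Show
instance Exception Dummy

dummyArg :: Int
dummyArg = throw Dummy

-- True when forcing the value to weak head normal form raises Dummy.
hitsDummy :: a -> IO Bool
hitsDummy v = do
  r <- try (evaluate v)
  return $ case r of
    Left Dummy -> True
    Right _    -> False

-- Examines its argument (as the upper bound of an enumeration).
examines :: Int -> (Int, [Int])
examines x = (1, [1 .. x])

-- Passes its argument through untouched.
passes :: Int -> (Int, Int)
passes x = (1, x)
```

Both hitsDummy (snd (examines dummyArg)) and hitsDummy (snd (passes dummyArg)) report that the dummy was reached, so an observer built this way cannot tell the two functions apart.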



We can instead use the type class system, again, to help us. 

class NewVar a where
  mkVar :: Dynamic -> a

Now, we can write a function that takes a function and returns the 
function argument and result as a tuple. 

capture :: (Typeable a, Typeable b, NewVar a)
        => (a -> b) -> (a,b)
capture f = (a,f a)
  where a = mkVar (toDyn f)

We use the Dynamic as a unique label (that does not admit equality) 
being passed to mkVar. To illustrate this class being used, consider 
a small DSL for arithmetic, modeled on the ideas for capturing 
arithmetic expressions used in Elliott et al. (2003). 

data Exp = ExpVar Dynamic
         | ExpLit Int
         | ExpAdd Exp Exp
         deriving (Typeable, ...)

instance NewVar Exp where
  mkVar = ExpVar

instance Num Exp where
  (+) = ExpAdd
  fromInteger n = ExpLit (fromInteger n)

With these definitions, we can capture our function 

> capture (\ x -> x + 1 :: Exp)
(ExpVar ..., ExpAdd (ExpVar ...) (ExpLit 1))
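
For readers who want to try this, the fragments above can be assembled into one self-contained sketch. Two details are assumptions made here rather than taken from the text: the deriving clause is reduced to Show, and the Num methods that the DSL does not model are stubbed with error so that the instance is complete.

```haskell
import Data.Dynamic (Dynamic, toDyn)
import Data.Typeable (Typeable)

class NewVar a where
  mkVar :: Dynamic -> a

-- Apply the function to a fresh variable and return both the variable
-- and the resulting expression.
capture :: (Typeable a, Typeable b, NewVar a) => (a -> b) -> (a, b)
capture f = (a, f a)
  where a = mkVar (toDyn f)

data Exp = ExpVar Dynamic
         | ExpLit Int
         | ExpAdd Exp Exp
         deriving Show

instance NewVar Exp where
  mkVar = ExpVar

instance Num Exp where
  (+)         = ExpAdd
  fromInteger = ExpLit . fromInteger
  -- Assumption: the remaining methods are outside the small DSL in the
  -- text, so they are stubbed rather than modelled.
  (*)    = error "Exp: (*) is not modelled"
  abs    = error "Exp: abs is not modelled"
  signum = error "Exp: signum is not modelled"
  negate = error "Exp: negate is not modelled"
```

Pattern matching on the result of capture (\ x -> x + 1 :: Exp) yields an ExpVar as the first component and an ExpAdd of an ExpVar and ExpLit 1 as the second.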

The idea of passing in an explicit ExpVar constructor is an old one,
and the data-structure used in Elliott et al. (2003) also included an
ExpVar, but required a threading of a unique String at the point
a function was being examined. With observable sharing, we can
observe the sharing that is present inside the capture function,
and reify our function without needing these unique names.

capture gives a simple mechanism for looking at functions, but
not functions inside data-structures we are observing for sharing.
We want to add the capture mechanism to our multi-type
reification, using a Lambda constructor in the graph node data-type.

instance ( MuRef a, Typeable a, NewVar a,
           MuRef b, Typeable b,
           DeRef a ~ DeRef (a -> b),
           DeRef b ~ DeRef (a -> b) )
    => MuRef (a -> b) where
  type DeRef (a -> b) = Node
  mapDeRef f fn = let v = mkVar $ toDyn fn
                  in Lambda <$> f v <*> f (fn v)

This is quite a mouthful! For functions of type a -> b, we need 
a to admit MuRef (have observable sharing), Typeable (because 
we are working in the multi-type observation version), and NewVar 
(because we want to observe the function). We need b to admit 
MuRef and Typeable. We also need a, b and a -> b to all share a 
common graph data-type. When observing a graph with a function, 
we are actually observing the sharing created by the let v = ...
inside the mapDeRef definition. 

Figure 2. Sharing within structures and functions



We need to add our MuRef instance for Exp, so we can observe 
structures of the type Exp. 

data Node u = ... | Lambda u u | Var | Add u u

instance MuRef Exp where
  type DeRef Exp = Node

  mapDeRef f (ExpVar _)   = pure Var
  mapDeRef f (ExpLit i)   = pure $ Int i
  mapDeRef f (ExpAdd x y) = Add <$> f x <*> f y

Finally, we can observe functions in the wild! 

> reifyGraph (let t = [ \ x -> x :: Exp
>                     , \ x -> x + 1
>                     , \ x -> head t 9 ]
>             in t)
Graph [ (1,Cons 2 4)
      , (4,Cons 5 9)
      , (9,Cons 10 13)
      , (13,Nil)
      , (10,Lambda 11 12)
      , (12,Int 9)
      , (11,Var)
      , (5,Lambda 6 7)
      , (7,Add 6 8)
      , (8,Int 1)
      , (6,Var)
      , (2,Lambda 3 3)
      , (3,Var)
      ]



Figure 2 shows the connected graph that this reification produced. 
The left hand edge exiting Lambda is the argument, and the right 
hand edge is the expression. 

In Elliott et al. (2003), an expression DSL like our example here 
was used to synthesize and manipulate infinite, continuous images. 
The DSL generated C code, allowing real time manipulation of 
image parameters. In Elliott (2004), a similar expression DSL was 
used to generate shader assembly rendering code plus C# GUI 
code. A crucial piece of technology needed to make both these 
implementations viable was a common sub-expression eliminator, 
to recover lost sharing. We recover the important common
sub-expressions for the small cost of observing sharing from within an
IO function.



11. Implementation of reifyGraph 

In this section, we present our implementation of reifyGraph. The
implementation is short, and we include it in the appendix.

We provide two implementations of reifyGraph in the Hackage
library data-reify. The first implementation of reifyGraph is a
depth-first walk over a tree at a single type, to discover structure,
storing this in a list. A second implementation also performs a
depth-first walk, but can observe sharing of a predetermined set of
types, provided they map to a common node type in the final graph.

One surprise is that we can implement our flexible observable
sharing functions in just a few lines of GHC Haskell. We use
the StableName abstraction, as introduced in Peyton Jones et al.
(1999), to provide our basic (typed) pointer equality, and the
remainder of our implementation is straightforward Haskell
programming.

Stable names are supplied in the library System.Mem.StableName,
to allow pointer equality, provided the objects have been declared
comparable inside an IO operation. The interface is small.

data StableName a

makeStableName :: a -> IO (StableName a)
hashStableName :: StableName a -> Int
instance Eq (StableName a)

If you are inside the IO monad, you can make a StableName
from any object, and the type StableName admits Eq without
looking at the original object. StableNames can be thought of as
pointers, and the Eq instance as pointer equality on these pointers.
Finally, hashStableName facilitates a lookup table containing
StableNames, and is stable over garbage collection.
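
As a small illustration of this interface, the sketch below wraps it in two hypothetical helpers, samePointer and pointerHash, which are not part of the library. Both force their arguments first, since an object may receive a different stable name before and after evaluation. The library only promises that equal stable names imply the same object; that the converse holds for an already evaluated object is an observation about GHC's behaviour, not a documented guarantee.

```haskell
import System.Mem.StableName (hashStableName, makeStableName)

-- Hypothetical helper: pointer equality on two values of the same type.
-- Both values are forced before their stable names are taken.
samePointer :: a -> a -> IO Bool
samePointer x y = x `seq` y `seq` do
  sx <- makeStableName x
  sy <- makeStableName y
  return (sx == sy)

-- Hypothetical helper: the hash used to index a table of stable names.
pointerHash :: a -> IO Int
pointerHash x = x `seq` (hashStableName <$> makeStableName x)
```

A lookup table in the style described here would key buckets on pointerHash and resolve collisions with the Eq instance.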

We use stable names to keep a list of already visited nodes. Our
graph capture is the classical depth-first search over the graph,
and does not recurse over nodes that we have already visited.
reifyGraph is implemented as follows.

• We initialize two tables, one that maps StableNames (at the
same type) to Uniques, and a list that maps Uniques to
edges in our final node type. In the first table, we use the
hashStableName facility of StableNames to improve the
lookup time.

• We then call a recursive graph walking function findNodes
with the two tables stored inside MVars.

• We then return the second table, and the Unique of the root node.

Inside findNodes, for a specific node, we

• Perform seq on this node, to make sure this node is evaluated. 

• If we have seen this node before, we immediately return the 
Unique that is associated with this node. 

• We then allocate a new Unique, and store it in our first MVar 
table, using the StableName of this node as the key. 

• We use mapDeRef to recurse over the children of this node. 

• This returns a new node of type "DeRef s Unique", where s is 
the type we are recursing over, and DeRef is our type function. 

• We store the pair of the allocated unique and the value returned 
by mapDeRef in a list. This list will become our graph. 

• We then return the Unique associated with this node. 
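
The steps above can be sketched for one concrete type. The following is a simplified illustration, not the library implementation: it is specialised to lists of Int, uses IORef where the paper uses MVar, keeps the visited table as a plain association list instead of hashing, and the names reifyList and cyclic are hypothetical.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef, writeIORef)
import System.Mem.StableName (StableName, makeStableName)

-- Graph node for a list of Ints; the tail is a reference of type u.
data Node u = Cons Int u | Nil
  deriving (Eq, Show)

-- Returns the list of (unique, node) pairs and the unique of the root.
reifyList :: [Int] -> IO ([(Int, Node Int)], Int)
reifyList xs0 = do
  seen  <- newIORef ([] :: [(StableName [Int], Int)])
  graph <- newIORef []
  next  <- newIORef (0 :: Int)
  let go xs = xs `seq` do              -- evaluate the node first
        st  <- makeStableName xs
        tab <- readIORef seen
        case lookup st tab of
          Just u  -> return u          -- already visited: reuse its unique
          Nothing -> do
            u <- (+ 1) <$> readIORef next
            writeIORef next u
            modifyIORef seen ((st, u) :)
            node <- case xs of         -- recurse over the children
              []       -> return Nil
              (y : ys) -> Cons y <$> go ys
            modifyIORef graph ((u, node) :)
            return u
  root  <- go xs0
  pairs <- readIORef graph
  return (pairs, root)

-- A two-element cyclic list, used to exercise loop detection.
cyclic :: [Int]
cyclic = 1 : 2 : cyclic
```

Because each node is recorded in the visited table before its children are walked, the back edge of cyclic is found on the second step and the traversal terminates with two nodes.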

It should be noted that the act of extracting the graph performs like 
a deep seq, being hyperstrict on the structure under consideration. 

The Dynamic version of reifyGraph is similar to the standard
reifyGraph. The first table contains Dynamics, not StableNames,
and when considering a node for equality, fromDynamic is
called at the current node type. If the node is of the same type as
the object inside the Dynamic, then the StableName equality is
used to determine pointer equality. If the node is of a different type
(fromDynamic returns Nothing), then the pointer equality fails by
definition.
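
The comparison described here can be sketched in a few lines. The helper names matches and dynName are hypothetical; the sketch only relies on toDyn and fromDynamic from Data.Dynamic and on the Eq instance for StableName.

```haskell
import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import Data.Typeable (Typeable)
import System.Mem.StableName (StableName, makeStableName)

-- Compare a typed stable name against one hidden inside a Dynamic.
-- If the types differ, fromDynamic yields Nothing and the test fails.
matches :: Typeable a => StableName a -> Dynamic -> Bool
matches h d = fromDynamic d == Just h

-- Hypothetical helper: force a value, then wrap its stable name.
dynName :: Typeable a => a -> IO Dynamic
dynName x = x `seq` (toDyn <$> makeStableName x)
```

A stable name taken at type Int never matches an entry that was stored for a list, regardless of where the two objects live in the heap.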

One shortcoming with the Dynamic implementation is the obscure 
error messages. If an instance is missing, this terse message is 
generated. 

Top level: 

Couldn't match expected type 'Node' 
against inferred type 'DeRef t' 

This is stating that the common type of the final Graph was ex- 
pected, and for some structure was not found, but does not state 
which one was not found. It would be nice if we could somehow 
parameterize the error messages or augment them with a secondary 
message. 

12. Reflections on Observable Sharing 

In this section, we consider both the correctness and consequences
of observable sharing. The correctness of reifyGraph depends on
the correctness of StableNames. Furthermore, observing the heap,
even from within an IO function, has consequences for the validity
of equational reasoning and the laws that can be assumed.

In the System.Mem.StableName library, stable names are defined
as providing "a way of performing fast [...], not-quite-exact
comparison between objects." Specifically, the only requirement on
stable names is that if two stable names are equal, then "[both] were
created by calls to makeStableName on the same object." This is a
property that could be trivially satisfied by simply defining equality
over stable names as False!

The intent of stable names is to implement the behavior of pointer 
equality on heap representations, while allowing the heap to use ef- 
ficient encodings. In reality, the interface does detect sharing, with 
the advertised caveat that an object before and after evaluation may 
not generate stable names that are equal. In our implementation, we 
use the seq function to force evaluation of each graph node under 
observation, just before generating stable names, and this has been 
found to reliably detect the sharing we expect. It is unsettling,
however, that we do not (yet) have a semantics of when we can and
cannot depend on stable names to observe sharing.

An alternative to using stable names would be to directly examine
the heap representations. Vacuum (Morrow) is a Haskell library for 
extracting heap representations, which gives a literal view of the 
heap world, and has been successfully used to both capture and 
visualize sharing inside Haskell structures. Vacuum has the ability 
to generate dot graphs for observation and does not require that a 
graph be evaluated before being observed. 

Vacuum and reifyGraph have complementary roles. Vacuum
allows the user to see a snapshot of the real-time heap without
necessarily changing it, while reifyGraph provides a higher level
interface, by forcing evaluation on a specific structure, and then
observing sharing on the same structure. Furthermore, reifyGraph does
not require the user to understand low-level representations to
observe sharing. It would certainly be possible to build reifyGraph
on top of Vacuum.

Assuming a reliable observation of sharing inside reifyGraph,
what are the consequences to the Haskell programmer? Claessen
and Sands (1999) argue that little is lost in the presence of
observable sharing in a call-by-name lazy functional language, and also
observe that all Haskell implementations use a call-by-name
evaluation strategy, even though the Haskell report (Peyton Jones 2003)
does not require this. In Haskell, let-β, a variant of β-reduction,
is the equality

$$\mathbf{let}\ \{x = M\}\ \mathbf{in}\ N \;=\; N[M/x] \qquad (x \notin M) \tag{1}$$

Over structural values, this equality is used with caution inside 
Haskell compilers, in either direction. To duplicate the construc- 
tion of a structure is duplicating work, and can change the time 
complexity of a program. To common up construction (using (1) 
from right to left) is also problematic because this can be detrimen- 
tal to the space complexity of a program. 

It is easy in Haskell to lose sharing, even without using (1).
Consider one of the map laws.

map id M = M (2)

Any structure that the spine of M has is lost in map id M.
Interestingly, this loss of sharing in map is not mandated, and a
version of map using memoization could preserve the sharing. This
is never done because we cannot depend on, or observe, sharing.

One place where GHC introduces unexpected sharing is when
generating overloaded literals. In Kansas Lava, the term 9 + 9
unexpectedly shares the same node for the value 9.

> reifyGraph (9 + 9)
Graph [ (1,Entity + [2,2])
      , (2,Entity fromInteger [3])
      , (3,Lit 9)
      ]



Literal values are like enumerated constructors, and any user of
reifyGraph must allow for the possibility of such literals being
shared.

What does all this mean? We can have unexpected sharing of
constants, as well as lose sharing by applying what we considered
to be equality holding transformations.

The basic guidelines for using reifyData are

• Observe only structures built syntactically. Combinators in our
DSLs are lazy in their (observed) arguments, and we do not
deconstruct the observed structure before reifyData.

• Assume constants and enumerated constructors may be shared, 
even if syntactically they are not the same expression. 

There is a final guideline when using observable sharing, which is
to allow a DSL to have some type of (perhaps informal) let-β rule.
In the same manner as rule (1) in Haskell should only change how
fast some things run and not the final outcome, interpreters using
observable sharing should endeavor to use sharing to influence
performance, not outcome. For example, in Lava, undetected acyclic
sharing in a graph would result in extra circuitry and the same
results being computed at a much greater cost. Even for undetected
loops in well-formed Lava circuits, it is possible to generate circuits
that work for a preset finite number of cycles.

If this guideline is followed literally, applying (1) and other
equational reasoning techniques to DSLs that use observable sharing is
now a familiar task for a functional programmer, because applying
equational reasoning changes performance, not the final result. A
sensible let-β rule might not be possible for all DSLs, but it
provides a useful rule of thumb to influence the design.

13. Performance Measurements

We performed some basic performance measurements on our
reifyGraph function. We ran a small number of tests observing
the sharing in a binary tree, both with and without sharing, on
both the original and Dynamic reifyGraph. Each extra level on
the graph introduces double the number of nodes.

Tree     Original                  Dynamic
Depth    Sharing    No Sharing     Sharing    No Sharing

16        0.100s      0.154s        0.147s      0.207s
17        0.237s      0.416s        0.343s      0.519s
18        0.718s      1.704s        0.909s      2.259s
19        2.471s      7.196s        2.845s      8.244s
20       11.140s     25.707s       13.377s     32.443s



While reifyGraph is not linear, we can handle 2^20 (around a
million) nodes in a few seconds.



14. Conclusions and Further Work 

We have introduced an IO-based solution to observable sharing that
uses type functions to provide type-safe observable sharing. The
use of IO is not a hindrance in practice, because the occasions we
want to observe sharing are typically the same occasions as when
we want to export a net-list like structure to other tools.

Our hope is that the simplicity of the interface and the familiarity
with the ramifications of using an IO function will lead to
reifyGraph being used for observable sharing in deep DSLs.

We need a semantics for reifyGraph. This of course will involve
giving at least a partial semantics to IO, for the way it is being used.
One possibility is to model the StableName equality as a
nondeterministic choice, where IO provides a True/False oracle. This
would mean that reifyGraph would actually return an infinite tree
of possible graphs, one for each possible permutation of answers
to the pointer equality. Another approach we are considering is to
extend Natural Semantics (Launchbury 1993) for a core functional
language with a reify primitive, and compare it with the semantics
for Ref-based observable sharing (Claessen and Sands 1999).

A. Implementation

{-# LANGUAGE UndecidableInstances, TypeFamilies #-}
module Data.Reify
  ( MuRef(..), module Data.Reify.Graph, reifyGraph
  ) where

import Control.Concurrent.MVar
import Control.Monad
import System.Mem.StableName
import Data.IntMap as M
import Control.Applicative
import Data.Reify.Graph

class MuRef a where
  type DeRef a :: * -> *
  mapDeRef :: (Applicative f)
           => (a -> f u)
           -> a
           -> f (DeRef a u)

reifyGraph :: (MuRef s) => s -> IO (Graph (DeRef s))
reifyGraph m = do rt1 <- newMVar M.empty
                  rt2 <- newMVar []
                  uVar <- newMVar 0
                  root <- findNodes rt1 rt2 uVar m
                  pairs <- readMVar rt2
                  return (Graph pairs root)

findNodes :: (MuRef s)
          => MVar (IntMap [(StableName s,Int)])
          -> MVar [(Int,DeRef s Int)]
          -> MVar Int
          -> s
          -> IO Int
findNodes rt1 rt2 uVar j | j `seq` True = do
  st <- makeStableName j
  tab <- takeMVar rt1
  case mylookup st tab of
    Just var -> do putMVar rt1 tab
                   return var
    Nothing -> do var <- newUnique uVar
                  putMVar rt1 $ M.insertWith (++)
                                  (hashStableName st)
                                  [(st,var)]
                                  tab
                  res <- mapDeRef (findNodes rt1 rt2 uVar) j
                  tab' <- takeMVar rt2
                  putMVar rt2 $ (var,res) : tab'
                  return var

mylookup h tab =
  case M.lookup (hashStableName h) tab of
    Just tab2 -> Prelude.lookup h tab2
    Nothing -> Nothing

newUnique :: MVar Int -> IO Int
newUnique var = do
  v <- takeMVar var
  let v' = succ v
  putMVar var v'
  return v'


{-# LANGUAGE UndecidableInstances, TypeFamilies,
             RankNTypes, ExistentialQuantification,
             DeriveDataTypeable, RelaxedPolyRec,
             FlexibleContexts #-}
module Data.Dynamic.Reify
  ( MuRef(..), module Data.Reify.Graph, reifyGraph
  ) where

import Control.Concurrent.MVar
import Control.Monad
import System.Mem.StableName
import Data.IntMap as M
import Control.Applicative
import Data.Dynamic
import Data.Typeable
import Data.Reify.Graph

class MuRef a where
  type DeRef a :: * -> *
  mapDeRef :: (Applicative f) =>
              (forall b . (MuRef b, Typeable b,
                           DeRef a ~ DeRef b)
                        => b -> f u)
           -> a
           -> f (DeRef a u)

reifyGraph :: (MuRef s, Typeable s)
           => s -> IO (Graph (DeRef s))
reifyGraph m = do rt1 <- newMVar M.empty
                  rt2 <- newMVar []
                  uVar <- newMVar 0
                  root <- findNodes rt1 rt2 uVar m
                  pairs <- readMVar rt2
                  return (Graph pairs root)

findNodes :: (MuRef s, Typeable s)
          => MVar (IntMap [(Dynamic,Int)])
          -> MVar [(Int,DeRef s Int)]
          -> MVar Int
          -> s
          -> IO Int
findNodes rt1 rt2 uVar j | j `seq` True = do
  st <- makeStableName j
  tab <- takeMVar rt1
  case mylookup st tab of
    Just var -> do putMVar rt1 tab
                   return var
    Nothing -> do var <- newUnique uVar
                  putMVar rt1 $ M.insertWith (++)
                                  (hashStableName st)
                                  [(toDyn st,var)]
                                  tab
                  res <- mapDeRef (findNodes rt1 rt2 uVar) j
                  tab' <- takeMVar rt2
                  putMVar rt2 $ (var,res) : tab'
                  return var

mylookup :: (Typeable a)
         => StableName a
         -> IntMap [(Dynamic,Int)]
         -> Maybe Int
mylookup h tab =
  case M.lookup (hashStableName h) tab of
    Just tab2 -> Prelude.lookup (Just h)
                   [ (fromDynamic c,u)
                   | (c,u) <- tab2 ]
    Nothing -> Nothing

newUnique :: MVar Int -> IO Int
newUnique var = do
  v <- takeMVar var
  let v' = succ v
  putMVar var v'
  return v'
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